Soft Error Modeling and Analysis for Microprocessors
Authors
Abstract
Soft errors are a growing concern for processor reliability. Recent work has motivated architecture-level studies of soft errors, since the architecture level can mask many raw errors and architectural solutions can exploit workload knowledge. This dissertation focuses on the modeling and analysis of soft error issues at the architecture level. We start with the widely used method for estimating the architecture-level mean time to failure (MTTF) due to soft errors. The method first calculates the failure rate of each architecture-level component as the product of its raw error rate and an architecture vulnerability factor (AVF). It then calculates the system failure rate as the sum of the failure rates (SOFR) of all components, and the system MTTF as the reciprocal of this failure rate. Both steps make significant assumptions. We analyze the validity of the two steps using both mathematical analysis and experiments. We find that although the AVF+SOFR method is valid for most current systems under current raw error rates, in some cases it can lead to significant discrepancies. We explore scenarios in which such discrepancies could occur in practice. As an alternative that is not subject to these limitations, we propose a model and tool called SoftArch, which does not make the AVF+SOFR assumptions. SoftArch is based on a probabilistic model of the error generation and propagation process in a processor. Our experiments show that SoftArch does not exhibit the discrepancies that the AVF+SOFR method suffers from. We apply SoftArch to an out-of-order processor running SPEC2000 benchmarks. Our results motivate selective and dynamic architecture-level soft error protection schemes. Next, as another application, we quantify the impact of technology scaling on the processor soft error rate, taking architecture-level masking and workload characteristics into consideration.
By using the SoftArch tool, we observe that there is much architecture level masking and that the degree of such masking can vary significantly across workloads, individual units, and workload
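The two-step AVF+SOFR calculation described in the abstract can be illustrated with a minimal Python sketch. The component names, raw error rates, and AVF values below are hypothetical and chosen only for illustration; the use of FIT (failures per 10^9 hours) as the rate unit is also an assumption, not something the abstract specifies. The sketch shows the method under analysis, not SoftArch.

```python
from dataclasses import dataclass

HOURS_PER_FIT_UNIT = 1e9  # assumption: rates are in FIT, i.e. failures per 10^9 hours


@dataclass
class Component:
    name: str       # hypothetical component label
    raw_fit: float  # raw soft error rate in FIT (made-up value)
    avf: float      # architecture vulnerability factor, between 0 and 1


def component_failure_rate(c: Component) -> float:
    """AVF step: failure rate = raw error rate * AVF."""
    return c.raw_fit * c.avf


def system_failure_rate(components: list[Component]) -> float:
    """SOFR step: system failure rate = sum of component failure rates."""
    return sum(component_failure_rate(c) for c in components)


def system_mttf_hours(components: list[Component]) -> float:
    """System MTTF = reciprocal of the system failure rate, in hours."""
    rate = system_failure_rate(components)
    if rate == 0:
        return float("inf")  # no failures expected under this model
    return HOURS_PER_FIT_UNIT / rate


# Hypothetical processor with three components (illustrative numbers only).
example = [
    Component("instruction_queue", raw_fit=100.0, avf=0.25),
    Component("register_file", raw_fit=200.0, avf=0.125),
    Component("reorder_buffer", raw_fit=50.0, avf=0.5),
]

if __name__ == "__main__":
    print("system failure rate (FIT):", system_failure_rate(example))
    print("system MTTF (hours):", system_mttf_hours(example))
```

Summing the rates and taking a single reciprocal is exactly where the method's assumptions enter, which is why the abstract examines the AVF step and the SOFR step separately.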
Similar resources
Soft Tissue Modeling Using ANFIS for Training Diagnosis of Breast Cancer in Haptic Simulator
Soft tissue modeling for the creation of a haptic simulator for training medical skills has been the focus of many attempts up to now. In soft tissue modeling the most important parameter considered is its being real-time, as well as its accuracy and sensitivity. In this paper, ANFIS approach is used to present a nonlinear model for soft tissue. The required data for training the neuro-fuzzy mo...
A Field Failure Analysis of Microprocessors used in Information Systems
Soft errors due to cosmic particles are a growing reliability threat to information systems. In this work, a methodology is developed to analyze the effects of single event upsets (SEU) and obtain Failure In Time (FIT) rates for commercial server microprocessors in live information systems. Our methodology is based on data collected from error logs and error traces of the microprocessors collec...
A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor
Current trends in device scaling continue to cause an increasing risk of transient faults in microprocessors due to high energy strikes from radiated particles. In this work, we present a thorough microarchitectural analysis of the effects of soft errors on a production-level Verilog implementation of an ARM926EJ-S core. We examine the propagation of faults occurring in both sequential state el...
Cost Effective Soft Error Mitigation in Microprocessors
Device scaling has caused the challenges that processor designers face to evolve significantly in the past. This trend will continue into the future, and reliability is emerging as a significant challenge. In this work, we focus on one aspect of the reliability problem: soft errors. In particular, cost effective mitigation of soft errors in processor microarchitecture. Our investigation begins ...
Soft error tolerant Content Addressable Memories (CAMs) using error detection codes and duplication
Soft Errors are becoming a major concern for modern computing systems. Memories are one of the elements affected by soft errors, which cause bitflips in some of the cells. A number of techniques such as the use of Error Correction Codes (ECCs), interleaving or scrubbing are utilized to mitigate the effects of soft errors on memories. Content Addressable Memories (CAMs) pose additional challenge...